HOT aSAX: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery

Authors

  • Ninh D. Pham
  • Quang Loc Le
  • Tran Khanh Dang
Abstract

Finding discords in time series databases has been an important problem over the last decade because of its many real-world applications, including data cleansing, fault diagnostics, and financial data analysis. To our knowledge, the best-known approach is the HOT SAX technique, which relies on the equiprobable distribution of SAX representations of time series. This characteristic, however, is not preserved after dimensionality reduction, especially for datasets that do not follow a Gaussian distribution. In this paper, we introduce a k-means based algorithm for symbolic representation of time series, called adaptive Symbolic Aggregate approXimation (aSAX), and propose the HOT aSAX algorithm for time series discord discovery. Because aSAX words are clustered, our algorithm achieves greater pruning power than the previous approach. Our experiments on real-world time series datasets confirm the theoretical analysis as well as the efficiency of our approach.
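The abstract does not spell out how the k-means step works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation: start from the equiprobable Gaussian breakpoints that SAX uses, then refine them with one-dimensional Lloyd (k-means) iterations over the PAA segment means. The function names (`paa`, `gaussian_breakpoints`, `adaptive_breakpoints`, `to_word`) and the choice to train on PAA values are assumptions made for illustration, and the input is assumed to be z-normalized already.

```python
from bisect import bisect_right
from statistics import NormalDist


def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-width frame."""
    if len(series) % segments != 0:
        raise ValueError("series length must be a multiple of segments")
    width = len(series) // segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(segments)]


def gaussian_breakpoints(alphabet_size):
    """Classic SAX breakpoints: equiprobable cuts of the standard normal."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def adaptive_breakpoints(values, initial_breakpoints, max_iter=100):
    """Refine breakpoints with 1-D Lloyd iterations (illustrative assumption).

    Each interval between breakpoints acts as a cluster; its centroid is the
    mean of the values inside it, and each new breakpoint is the midpoint
    between neighbouring centroids.
    """
    bps = sorted(initial_breakpoints)
    k = len(bps) + 1
    # Fallback representatives, used only while an interval holds no values.
    centroids = [bps[0]] + [(a + b) / 2 for a, b in zip(bps, bps[1:])] + [bps[-1]]
    for _ in range(max_iter):
        sums, counts = [0.0] * k, [0] * k
        for v in values:
            j = bisect_right(bps, v)  # index of the interval containing v
            sums[j] += v
            counts[j] += 1
        centroids = [sums[j] / counts[j] if counts[j] else centroids[j]
                     for j in range(k)]
        new_bps = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        if new_bps == bps:  # assignments no longer change
            break
        bps = new_bps
    return bps


def to_word(values, breakpoints):
    """Map each value to a letter according to the interval it falls in."""
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in values)


if __name__ == "__main__":
    # Toy, clearly non-Gaussian data: two tight groups far apart.
    segment_means = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    fixed = gaussian_breakpoints(2)
    adapted = adaptive_breakpoints(segment_means, fixed)
    print(to_word(segment_means, fixed), to_word(segment_means, adapted))
```

On the toy data in the demo, the fixed Gaussian breakpoint puts every value in the same symbol, while the refined breakpoint separates the two groups. That is the kind of effect the abstract attributes to datasets that are not Gaussian distributed, though the toy data here is invented purely for illustration.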

Similar resources

Some Novel Heuristics for Finding the Most Unusual Time Series Subsequences

In this work, we introduce some novel heuristics that can enhance the efficiency of the Heuristic Discord Discovery (HDD) algorithm proposed by Keogh et al. for finding the most unusual time series subsequences, called time series discords. Our new heuristics consist of a new discord measure function which helps to set up a range of alternative good orderings for the outer loop in the HDD algorith...

Finding the Unusual Medical Time Series: Algorithms and Applications

In this work we introduce the new problem of finding time series discords. Time series discords are subsequences of longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. While discords have many uses for data mining, they are particularly attractive as anomaly detectors b...

Proceedings of DMKD '03: 8th ACM SIGMOD Workshop on Research Issues in Data Mining and

Continuous data streams arise naturally, for example, in the installations of large telecom and Internet service providers where detailed usage information (Call-Detail-Records, SNMP/RMON packet-flow data, etc.) from different parts of the underlying network needs to be continuously collected and analyzed for interesting trends. Such environments raise a critical need for effective stream-proce...

A Symbolic Representation Method to Preserve the Characteristic Slope of Time Series

In recent years, many studies have addressed knowledge discovery in time series. Most methods use some technique to transform raw data into another representation. Symbolic representation approaches have proven effective at speeding up processing and removing noise. The most commonly used algorithm at present is the Symbolic Aggregate Approximation (SAX). However, SAX doesn’t preserve the sl...

A Novel Method for the Efficient Retrieval of Similar Multiparameter Physiologic Time Series Using Wavelet-Based Symbolic Representations

An important challenge in data mining is in identifying "similar" temporal patterns that may illuminate hidden information in a database of time series. We are actively engaged in the development of a temporal database of several thousand ICU patient records that contains time-varying physiologic measurements recorded over each patient's ICU stay. The discovery of multiparameter temporal patter...

Publication date: 2010